[57] ABSTRACT

A method and apparatus for compacting VLIW instructions 
in a processor having multiple functional units and including 
a buffer for storing compacted instructions, wherein NOP 
codes are eliminated from the compacted instruction and 
each compacted instruction includes words which contain an 
operation code directing the operation of one of the func- 
tional units, a dispersal code, and a delimiter code, wherein 
an alignment circuit parses each compacted instruction from 
the buffer based upon the delimiter codes of the words and 
aligns the compacted instruction in an alignment buffer and 
a dispersal circuit transfers each word of the compacted 
instruction stored in the alignment buffer into at least one 
operational field of a dispersed instruction buffer which 
stores an executable instruction having an operational field 
corresponding to each one of the functional units. Another 
embodiment is also shown which interleaves the bits of a 
buffer, alignment circuit, alignment buffer, dispersal circuit 
and dispersed instruction buffer to reduce the circuit area 
required for expanding the compacted instruction. 

9 Claims, 9 Drawing Sheets 
METHOD FOR STORING AND DECODING
INSTRUCTIONS FOR A MICROPROCESSOR 
HAVING A PLURALITY OF FUNCTION 
UNITS 

This is a continuation-in-part of co-pending, commonly
assigned, Ser. No. 08/767,450, filed Dec. 16, 1996,
incorporated herein by reference.

BACKGROUND OF THE INVENTION 

This invention relates to computers which utilize wide 
instruction words to achieve instruction level parallelism 
and, more particularly, to methods and apparatus for storing 
wide instruction words in compressed form and for expand- 
ing the compressed instruction words for execution. 

One of the approaches to improving microprocessor per- 
formance is instruction level parallel processing. Instruction 
level parallel processing involves execution in parallel of 
low level machine operations, such as memory loads and 
stores, integer additions and floating point multiplications. 
Processors for implementing instruction level parallelism 
typically include multiple execution units and are controlled 
by Very Long Instruction Words (VLIW's). Each VLIW 
specifies the operations that are to be executed in a single 
cycle and includes multiple operation fields, alternatively 
referred to as syllables. The source program is typically 
written in a high level language without attention to opera- 
tions that can be performed in parallel. The conversion of a 
source program to machine code which utilizes instruction 
level parallelism involves scheduling of operations which 
can be executed in parallel. The scheduling function may be 
performed by a compiler or by the processor itself. When 
scheduling is performed by the processor, the processor 
hardware may become complex. When scheduling is per- 
formed by the compiler, the processor simply executes the 
operations contained in the VLIW. Instruction level parallel
processing is described by J. A. Fisher et al. in Science,
Vol. 253, Sep. 13, 1991, pp. 1233-1241 and by B.
Ramakrishna et al. in the Journal of Supercomputing, Vol. 7,
1993, pp. 9-50.

For maximum utilization of a processor having multiple 
execution units, each execution unit should perform an 
operation on every processor cycle. The execution units of 
the processor may be fully utilized during computation- 
intensive portions of a program. In this case, all or nearly all 
of the operation fields, or syllables, of the VLIW are filled.
Other portions of the program may not require all of the 
resources of the processor. In this case, some of the execu- 
tion units are idle, and one or more operation fields of the 
VLIW are filled with a no operation (NOP) code. 

FIG. 1 illustrates an example of an instruction word 10 
containing syllables S1-S6 which, in turn, contain operation 
codes for functional units F1-F6 respectively. In the 
example illustrated, functional units F2 and F4 are not 
needed to execute instruction word 10 and therefore contain 
NOP codes. 

The number of NOPs in a program may be significant. 
Storing instruction words with significant numbers of NOPs 
in memory is wasteful of memory space. To avoid inefficient 
use of memory, techniques for storing wide instruction 
words in compressed format have been proposed. 

In one conventional approach, compressed instructions
are stored with a mask word. The operation syllables of the
compressed instruction are stored in consecutive memory
locations, or words. The mask word encodes where the
operation syllables are inserted in the expanded instruction.
The remaining syllables of the expanded instruction are
filled with NOP codes. Since the mask word is normally only
a few bits wide, two or more mask words can be grouped in
the same memory word. This approach is illustrated in FIG.
2. An instruction word pair is stored in compressed format
in memory as a mask word 20 followed in consecutive
memory locations by operations W00, W02, W05, W06, and
W07 of a first instruction word and operations W12 and
W14 of a second instruction word. A mask field 22 in mask
word 20 indicates the locations of the operations W00, W02,
W05, W06 and W07 in a first line 34 of instruction cache 24,
and mask field 26 indicates the positions of operations W12
and W14 in a second line 36 of instruction cache 24.
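
The mask-word expansion described above can be sketched as follows. This is a minimal illustration only, not the circuit of the cited patents: it assumes an eight-field instruction line, assumes that bit i of the mask marks field i as occupied, and reads the digit in each operation name (W00, W02, W05, W06, W07) as the field position. The function name `expand_with_mask` is hypothetical.

```python
NOP = "NOP"

def expand_with_mask(mask, ops, width):
    """Expand a mask-compressed instruction into `width` fields.

    Assumption for this sketch: bit i of `mask` is 1 when field i
    receives the next stored operation; all other fields get a NOP.
    """
    remaining = iter(ops)
    expanded = []
    for field in range(width):
        if (mask >> field) & 1:
            expanded.append(next(remaining))
        else:
            expanded.append(NOP)
    return expanded

# First instruction word of the FIG. 2 example: fields 0, 2, 5, 6 and 7.
mask = 0b11100101
line = expand_with_mask(mask, ["W00", "W02", "W05", "W06", "W07"], 8)
```

Note that the mask costs a fixed number of bits per instruction regardless of how many syllables survive, which is the overhead the later discussion objects to.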

Due to the variable length of the compressed instruction
format in memory, it is necessary to record the offset to the
next instruction address somewhere in the instruction itself.
The offset must also be stored in the instruction cache to be
able to execute correct program counter sequencing and to
maintain coherency between the program counter and the
main memory code image. The offset to the next instruction
address can be stored in mask word 20 as fields 30 and 32
and can be stored in instruction cache 24 as fields 38 and 40.
An instruction compression and expansion technique similar
to that shown in FIG. 2 and described above is disclosed in
U.S. Pat. No. 5,057,837 issued Oct. 15, 1991 to Colwell et
al. and U.S. Pat. No. 5,179,680 issued Jan. 12, 1993 to
Colwell et al. The major disadvantage of using the technique
shown in FIG. 2 and described above is that consecutive
instructions do not correspond to consecutive instruction
cache locations, as they are separated by an address
difference that depends on the variable length of the instruction.
This introduces an artificial alias for instructions that are
physically separated by a distance that is smaller than the
instruction cache size. For example, in a 1024 line
instruction cache, a code section of 1024 instructions will very
likely contain aliases to the same cache locations, unless
proper padding is performed by the loader. This padding is
possible only if empty spaces are left in main memory. In the
example of FIG. 2, instruction pair #n occupies a cache hole
left by the previous instructions. To achieve this, the
assembler is forced to leave empty memory areas to get to the
desired address of the cache hole. In the example of FIG. 2,
twelve memory words are wasted to avoid a conflicting
address for instruction pair #m.

In summary, the technique shown in FIG. 2 and described
above has several disadvantages. The instruction cache must
have a larger capacity to store the offset to the next
instruction address. Program counter sequencing is complicated
because it needs to compute the next instruction addresses.
Also, the variable instruction length introduces artificial
aliases in the instruction cache. And, if the loader pads
instructions in main memory to avoid the problem of
artificial aliases, holes are created in main memory.

In addition, the scheme of FIG. 2 requires the allocation
of a fixed number of bits for the bit mask, which can lead to
high overhead when there are only a few syllables in the
instruction which are not NOPs. This scheme also requires
hardware for dispersal of the instruction that occupies a large
circuit area and is not easily amenable to bit interleaving.

Accordingly, a need remains for the storage of VLIWs in
a compacted format wherein at least a portion of the
instruction syllables containing NOPs are eliminated and wherein
the compacted instructions are stored sequentially in
instruction memory.

SUMMARY OF THE INVENTION

The present invention includes an instruction encoding
method to reduce or eliminate NOPs in VLIW instructions.
This method for storing and decoding instructions for a
microprocessor involves identifying each word of an
instruction that does not contain a NOP code, generating a
dispersal code for each identified word, where the dispersal
code corresponds to a field of the instruction occupied by the
identified word, generating a delimiter code for each
identified word, where the delimiter code is set to identify a
boundary between the words of each instruction and the
words of an adjacent instruction, and storing each identified
word along with the corresponding dispersal and delimiter
codes. This method results in smaller programs, which
reduces the amount of disk and main memory usage in a
computer. The method may also be applied to an on-chip
cache memory which stores instructions in the same format,
resulting in better utilization of on-chip cache memory.

The present invention can be implemented in a processor
comprising a plurality of functional units and having a buffer
for storing compacted instructions, each compacted
instruction including words containing an operation code, a
dispersal code, and a delimiter code. Alignment means parses
each one of the compacted instructions from the buffer based
upon the delimiter codes of the words of each compacted
instruction. An alignment buffer stores each compacted
instruction after the compacted instruction is parsed from the
buffer means. A dispersed instruction buffer stores an
executable instruction, wherein the executable instruction
includes an operational field corresponding to each one of
the plurality of functional units. Dispersal means transfers
each word of the compacted instruction stored in the
alignment buffer into at least one operational field of the
executable instruction responsive to the dispersal code
corresponding to the word.

A further aspect of the present invention is a bit
interleaving technique for the storage of instructions in on-chip
cache memory wherein the bits of an instruction word are
interleaved when the instruction is fetched from memory.
Thereafter, the identified words of the instruction are parsed
from those of adjacent instructions and dispersed in
interleaved format. The instruction words are then de-interleaved
before being distributed to their corresponding functional
units. This makes the instruction encoding method more
efficient for large instruction widths without an area and
speed penalty in the dispersal of instructions to functional
units. This latter aspect of the invention is not limited to
VLIW machines, but also applies to any processor with
multiple functional units.

The foregoing and other objects, features and advantages
of the invention will become more readily apparent from the
following detailed description of a preferred embodiment of
the invention which proceeds with reference to the
accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a conventional VLIW
with syllables that correspond to functional units of a VLIW
processor.

FIG. 2 is a schematic representation of an instruction
cache and a memory in a conventional processor, illustrating
storage of wide instruction words in compressed format.

FIG. 3 is a block diagram of a VLIW computer system
according to the present invention which reads and expands
compacted VLIWs for execution.

FIG. 4 is a schematic diagram illustrating the format of a
compacted instruction according to the present invention in
the instruction memory of the computer system of FIG. 3
and its relationship to a dispersed instruction in the
instruction buffer of FIG. 3.

FIG. 5 is a schematic diagram illustrating the format of
sequential compacted instructions according to the present
invention in the instruction memory of FIG. 3.

FIG. 6 is a block diagram of an embodiment of the
dispersal block of FIG. 3 according to the present invention
and a line of instruction cache, alignment logic, alignment
buffer, expansion logic and dispersed instruction buffer of
FIG. 7.

FIG. 7 is a block diagram of a VLIW computer system
incorporating the line of instruction cache, alignment logic,
alignment buffer, expansion logic and dispersed instruction
buffer illustrated in FIG. 6.

FIG. 8 is a block diagram illustrating an embodiment of
a dispersal circuit according to the present invention in
which the bits of the compacted instructions are interleaved.

FIG. 9 is a schematic block diagram of a VLIW computer
system incorporating the bit interleaved dispersal circuit of
FIG. 8.

DETAILED DESCRIPTION

A block diagram of a computer system according to the
present invention, illustrating portions required for storing
and expanding wide instruction words and for executing
wide instruction words, is shown in FIG. 3. A program
counter 200 provides successive instruction addresses of a
program being executed to instruction cache 100 through an
address buffer 202. Instruction addresses are also provided
to a cache refill state machine 204, an adder 206 which
increments program counter 200, and to a comparator 208.
The cache refill state machine 204 controls refilling of
instruction cache 100 from instruction memory 110 when a
cache miss occurs. A cache miss occurs when the required
instruction word, i.e. the instruction word specified by the
instruction address, is not present in instruction cache 100.
Dispersal block 210 controls expansion of compressed
instruction words stored in instruction memory 110 to
expanded instruction words for storage in instruction cache
100.

When the expansion of the instruction word is complete,
it is transferred into instruction cache 100. Instruction words
are transferred from instruction cache 100 to an instruction
buffer 220 and then to an instruction decoder 222. The
instruction decoder 222 decodes each operation in the
instruction word and provides control signals to execution
units 230. In a processor which utilizes wide instruction
words, execution units 230 may include two or more
arithmetic units and/or multipliers for parallel computation.
Furthermore, the execution units 230 may access a data
cache and/or memory in parallel with the computations. In
the case of a branch instruction, the execution units 230 may
supply the program counter 200 with the branch address,
thereby overriding normal program sequencing. The
comparator 208 compares the instruction address with the cache
tags 120 to determine if the required instruction word is
present in instruction cache 100.

FIG. 4 illustrates an example of an embodiment of the
compacted instruction format of the present invention
wherein each syllable of compacted instruction word 400
also contains a set of dispersal bits. Compacted instructions
have at least a portion of the syllables in the VLIW
corresponding to NOPs eliminated. Syllables S1, S3, S5 and S6
of compacted instruction 400 contain operation codes that
are not NOPs and, in addition, dispersal bit sets D1, D3, D5
and D6 respectively. Each syllable's dispersal bits encode
the functional unit where the syllable is to be executed. After
a compacted instruction is fetched, dispersal hardware uses
the dispersal bits for each syllable to route the syllable to its
corresponding functional unit. This frees syllables from
being restricted to a fixed location in a compacted
instruction.

In general, the truth table performing a mapping of D to
F might be different for different syllables S in the
compacted instruction. For example, D1=000 might encode that
S1 should be dispersed to F1, while the same value for D3
might encode that S3 is to be dispersed to F2. The syllables
in a compacted instruction generally, but not necessarily, do
not have any NOPs.

An example of the delimiter encoding scheme of the
present invention is shown in FIG. 5. In addition to a
dispersal bit set, each syllable of the compacted instruction
also includes delimiter encoding to indicate the boundaries
between sequential compacted instructions. This encoding is
in the form of a single bit included in each syllable called a
start bit.

In the start bit scheme, the bit is set to one value for the
first syllable in a compacted instruction and to the opposite
value for all other syllables of the compacted instruction.
Alternatively, a stop bit scheme may be employed wherein
the stop bit is set to one value for the last syllable in a
compacted instruction and to the opposite value in all other
syllables of the compacted instruction.

For example, syllable S1 of compacted instruction 1 in
FIG. 5 includes dispersal bits D1 and S-bit 502 which is set
to indicate the start of compacted instruction 1. In contrast,
S-bits 504, 506 and 508 corresponding to syllables S3, S5
and S6 respectively are clear to indicate that they are part of
the same compacted instruction indicated by S-bit 502. S-bit
510 corresponding to syllable S7 of compacted instruction 2
is also set indicating the end of compacted instruction 1 and
the beginning of compacted instruction 2.

The delimiter encoding scheme described above is
utilized by an instruction sequencing method and apparatus to
control the fetch of each successive instruction from
memory storage. An example of such an instruction
sequencing method and apparatus is disclosed in the
commonly-assigned patent application titled METHOD
AND APPARATUS FOR STORING AND EXPANDING
PROGRAMS FOR WIDE INSTRUCTION WORD
PROCESSOR ARCHITECTURES, Ser. No. 08/767,450, the
disclosure of which is herein incorporated by reference.

FIG. 6 is a diagram showing the extraction and dispersal
of compacted instructions in a processor with instructions
encoded as described above with regard to dispersal block
210 of FIG. 3 and which also describes the function of a line
of instruction cache 710, alignment logic 720, alignment
buffer 730, expansion logic 740 and dispersed instruction
buffer 750 of FIG. 7. The function of dispersal block 600 is
to transform a compacted instruction into a fixed length
dispersed instruction which is output on interconnect wires
which are routed to functional units FU1-FUN. From a
functional standpoint, a sequence of compacted instructions,
such as 604A, 604B, 604C and 604D, are read in from
memory and arranged in line buffer 610 that is filled from
instruction memory, which also represents a line of
instruction cache 710 for the embodiment of FIG. 7.

It should be noted that compacted instructions are stored
in memory on the basis of the line length of the instruction
cache or line buffer. The assembler that assembles the
operation code for the processor will fit as many complete
sequential compacted instructions as will fit in the line
length and will pad any word spaces in the line that do not
contain an instruction word. Instructions are fetched from
memory by the processor on a line basis each time a memory
access is made. Therefore, only complete instructions are
contained in each line. Compacted instructions can be
permitted to straddle line boundaries by splitting the instruction
cache into multiple banks and adding logic to fetch multiple
consecutive instruction lines. However, the number of
interconnect wires required is also multiplied and additional
prefetch logic is required.

A compacted instruction, 604C in the example shown, is
extracted from buffer 610 based upon the delimiter bits of
the individual words of the compacted instructions in line
buffer 610. Extraction of a compacted instruction can take
place from any instruction word position in buffer 610 using
the delimiter bit scheme described above. In the illustrated
example, beginning with the start bit of instruction word W2,
the instruction 604C is parsed from buffer 610 by alignment
logic 620, aligned and stored in alignment buffer 630.

The instruction words in alignment buffer 630 are then
expanded to dispersed instruction buffer 650 based upon the
D bits associated with each word. For example, the D bits of
W2 indicate that W2 is to be dispersed to the dispersed
instruction buffer field 2 which corresponds to FU2. The
fields of dispersed instruction buffer 650 have a one to one
correspondence with the functional units. The D bits for
each instruction word in alignment buffer 630 can indicate
that the associated instruction word is to be distributed to
one or more fields of dispersed instruction buffer 650. The
fields of dispersed instruction buffer 650 that do not receive
instruction words from alignment buffer 630 are loaded with
NOPs.

If line buffer 610 holds L instruction words, the alignment
function of alignment logic 620 for loading alignment buffer
630 requires an L position shifter circuit. Each instruction
word has a predetermined number of bits B. Each of the B
bits of a word are shifted as a group, without differentiation,
by the same amount. Alignment buffer 630 holds from 1 to
N syllables, where N is the number of functional units. The
word positions in alignment buffer 630 are numbered from
1 to N. The alignment buffer 630 can be constructed to hold
more than N words, but these extra words are ignored
because no more than N words can be dispersed in a cycle.
The B bits of W2 in field 1 of alignment buffer 630 can come
from any of the L word fields in line buffer 610. This
requires a traversal in alignment logic 620 of a B bit bus
from the furthest word position in line buffer 610 to field 1
in alignment buffer 630. The height of alignment logic 620
is therefore proportional to the product L*B.

Expansion logic 640 has connections that allow an
instruction syllable in field 1 of alignment buffer 630 to be
routed to any of the fields 1 through N of dispersed
instruction buffer 650 for routing to any of the functional units FU1
through FUN. An instruction syllable in field 2 of alignment
buffer 630 is routable to fields 2 through N of dispersed
instruction buffer 650. The remaining fields of alignment
buffer 630 are similarly dispersable via expansion logic 640
to the fields of dispersed instruction buffer 650. The longest
signal run in expansion logic 640 has a length proportional
to N*B. The height of expansion logic 640 is also
proportional to N*B.

For large values of L, N and B, alignment logic 620 and
expansion logic 640 require interconnect wires that travel
long distances and occupy a large area of the chip. This is
detrimental to the performance and cost of the resulting
circuit.

The number of D bits in each instruction word is
dependent on the number of functional units in the processor and
the desired dispersal range of each instruction syllable (i.e.
the range of functional units to which an instruction syllable
may be distributed based upon the D bits of the instruction
word). One approach is to require that each syllable be
dispersable to any of N functional units. In this case, the
number of bits in D is log2(N), rounded up to the next
integer. Therefore, if there are eight functional units (N=8),
then three D bits are required per instruction word to
disperse the instruction syllable to all eight functional units.

Another approach is to restrict each syllable to being 
dispersed to a subset of functional units in order to reduce 
the complexity and size of expansion logic 640 and reduce 
the size of each instruction word. For example, each syllable 
may be limited to dispersal to four "nearby" functional units. 
In this case, the length of D would be 2. This second 
approach will not allow all NOPs to be eliminated from the 
compacted instructions. 
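
The two sizing approaches above can be checked with a short calculation. The sketch below is illustrative only; the function name `dispersal_bits` is hypothetical, and it assumes the D field is a plain binary index over the units a syllable can reach.

```python
def dispersal_bits(reachable_units: int) -> int:
    """Bits needed to name one of `reachable_units` targets.

    Equivalent to log2(n) rounded up to the next integer, computed
    with integer arithmetic to avoid floating point rounding.
    """
    if reachable_units < 1:
        raise ValueError("need at least one reachable functional unit")
    return (reachable_units - 1).bit_length()

full_range = dispersal_bits(8)   # any of N = 8 functional units
nearby_only = dispersal_bits(4)  # restricted to four "nearby" units
```

With these inputs the full-range case needs three D bits and the restricted case needs two, matching the figures given in the text.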

FIG. 7 illustrates another computer system adapted to 
utilize the compacted instructions described above. The 
computer system 700 of FIG. 7 is similar to the computer 
system of FIG. 3 except that computer system 700 stores the 
VLIW instructions in an instruction cache 714 in the same 
compacted form as the instructions in instruction memory 
700. This makes better use of instruction cache 714 by 
reducing or eliminating the NOPs stored in the cache. 
Storing instructions in compacted form in instruction cache 
714 has the additional advantage of keeping the instruction 
cache addresses the same as the instruction memory 
addresses. However, the expansion and dispersal functions 
described in connection with dispersal logic 600 of FIG. 6 
are required to operate in the processor pipeline after the 
fetch stage for a compacted instruction. 

As described in the context of dispersal logic 600 of FIG. 
6 above, words from instruction cache 710 are transferred to 
alignment buffer 730 through alignment logic 720 which 
parses the compacted instruction from the instruction cache 
line and aligns the compacted instruction with its start bit. 
Expansion logic 740, based upon the D bits for each 
instruction word, then disperses operation codes of the 
instruction words in alignment buffer 730 to the fields of 
dispersed instruction buffer 750. Dispersed instruction 
buffer 750 has a field corresponding to each of the functional 
units in block 760 and those fields which do not receive an 
operation code syllable from an instruction word in align- 
ment buffer 730 are filled with a NOP code. 
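
As a software analogy for the compaction format and the parse-and-disperse path described above, the following sketch compacts wide instructions into start-bit-delimited words and expands them again. It is a simplified model under stated assumptions, not the hardware of FIG. 6 or FIG. 7: the dispersal code is taken to be the 0-based index of the target field, NOP fields are represented by `None`, every instruction is assumed to hold at least one non-NOP syllable, and the names `Word`, `compact` and `expand` are hypothetical.

```python
from typing import List, NamedTuple, Optional

class Word(NamedTuple):
    start: int  # delimiter code: 1 on the first word of a compacted instruction
    disp: int   # dispersal code: here the 0-based target field (an assumption)
    op: str     # operation code

def compact(instr: List[Optional[str]]) -> List[Word]:
    """Drop NOP fields (None) and tag each remaining word."""
    words: List[Word] = []
    for field, op in enumerate(instr):
        if op is not None:
            # The first surviving word gets start=1, the rest get start=0.
            words.append(Word(start=int(not words), disp=field, op=op))
    if not words:
        raise ValueError("sketch assumes at least one non-NOP syllable")
    return words

def expand(stream: List[Word], n_units: int) -> List[List[Optional[str]]]:
    """Parse on start bits, then disperse each word; other fields stay NOP."""
    out: List[List[Optional[str]]] = []
    for w in stream:
        if w.start:
            out.append([None] * n_units)  # new dispersed instruction, all NOPs
        out[-1][w.disp] = w.op
    return out

program = [
    ["S1", None, "S3", None, "S5", "S6"],
    [None, "S7", None, None, None, None],
]
stream = [w for instr in program for w in compact(instr)]
restored = expand(stream, n_units=6)
```

Here ten of the twelve original fields collapse to five stored words, and the start bits of the stream follow the set, clear, clear, clear, set pattern described for FIG. 5.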

To reduce the length of interconnect wires and reduce the 
chip area required by the dispersement logic 600 shown in 
FIG. 6, the present invention can be implemented with a bit 
interleaved dispersement logic 800 as shown in FIG. 8. 

When an instruction line is fetched from external memory, 
the instruction words in the fetched line are bit interleaved 
by interleaver logic 880 before being stored in bit interleaved
instruction cache 810. The instructions in external
memory are kept in their conventional form with bits within 
each instruction word being adjacent to one another. The 
instruction words are interleaved when they are brought on 
to the chip from external memory, either for storage in 
instruction cache 810 or into a line buffer. The bits of each 
of the instruction words in the fetched line are interleaved in 
instruction cache 810. 

In most situations the number of bits B per instruction 
word is a power of 2, i.e. 16, 32, 64. The number of syllables 
L in a line buffer or line of instruction cache 810 is also 
typically a power of 2. However, this is not strictly 
necessary, and the bit interleaving scheme simply relies on 
each instruction word being the same size. The line buffer
has B groups of L bits per group; each group is called a
'bitgroup'.

In instruction cache 810, the bit 0 of all the instruction
words are in bit group 0, the bit 1 of all the instruction words
are in bit group 1 and so on for each bit up to bit (B-1), where
B is the size of the instruction words. Each bit group is 
therefore L bits in size, where L is the line length of the 
instruction cache 810. Bit 0 of all the instruction words of 
the fetched line are physically adjacent to each other in 
instruction cache 810, alignment logic 820, alignment buffer 
830, expansion logic 840 and dispersed instruction buffer 
850 which are all bit interleaved. The same scheme applies 
to each of the other bit groups as well. 
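
The bit group layout described above amounts to transposing the line from L words of B bits into B groups of L bits. A minimal sketch, assuming each instruction word is held as a Python integer and bit 0 is the least significant bit (the bit numbering and the function names are assumptions of this example):

```python
def interleave(words, B):
    """Return B bit groups; group b holds bit b of every word in the line."""
    return [[(w >> b) & 1 for w in words] for b in range(B)]

def deinterleave(groups):
    """Reassemble whole words from the bit groups (inverse of interleave)."""
    line_length = len(groups[0])
    return [
        sum(groups[b][i] << b for b in range(len(groups)))
        for i in range(line_length)
    ]

words = [0b1011, 0b0100, 0b0001, 0b1110]  # L = 4 words of B = 4 bits
groups = interleave(words, B=4)
```

Each group has L entries, one per word in the line, and de-interleaving the groups recovers the original words unchanged.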

Since the first bit of each instruction word is the start bit, 
bit group 0 becomes a collection of the start bits of all the 
instruction words in the buffer and can be used to drive the 
alignment logic 820 in parsing and aligning each instruction 
in the fetched line. For example, a bit interleaved compacted 
instruction can be aligned by simply left shifting all the bit 
groups until a bit set to 1 is found in bit group 0. 
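
The use of bit group 0 to drive alignment can be sketched as below. This is a rough model rather than the shifter of alignment logic 820: it assumes a start-bit scheme in which 1 marks the first word of an instruction, assumes the line contains no padding words, and represents the uniform shift as a list slice applied identically to every bit group. The function name `align` is hypothetical.

```python
def align(groups, pos):
    """Extract the compacted instruction whose first word is at `pos`.

    Bit group 0 (the start bits) determines the instruction length;
    every bit group is then shifted by the same amount.
    """
    starts = groups[0]
    if not starts[pos]:
        raise ValueError("pos must point at a word whose start bit is set")
    length = 1
    while pos + length < len(starts) and not starts[pos + length]:
        length += 1
    aligned = [g[pos:pos + length] for g in groups]
    return aligned, length

# Two bit groups for a line of L = 8 words: start bits plus one payload bit.
groups = [
    [1, 0, 0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 1],
]
second, second_len = align(groups, 3)
```

Only bit group 0 is inspected to find the boundary; the remaining groups simply follow the same shift, which is why the control logic does not grow with the word width B.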

The D bits for each instruction word are then used to 
disperse the bit from that instruction word in each bit group 
in aligned buffer 830 to the appropriate bit group field in 
dispersed instruction buffer 850. Each functional unit in 
block 860 has a corresponding bit in each bit group field of 
dispersed instruction buffer 850. The bits in each bit group 
field in dispersed instruction buffer 850 which are not 
populated with a bit from the aligned buffer 830 are popu- 
lated with a bit value that results in a NOP code being sent 
to the corresponding functional unit in block 860. 

The expanded instruction in dispersed instruction buffer
850 is then de-interleaved such that intact, non-bit-interleaved
operation codes arrive at functional units
860. All the bits in an instruction word that need to be sent
to a given functional unit need to be gathered adjacent to
each other at the functional unit. This de-interleaving is
relatively simple because it does not involve any control
mechanism: a signal run in a certain location
is routed to another location without any third signal
controlling either the starting or ending locations. There are no
multiplexers or active circuitry involved in the
de-interleaving function; it is simply a routing of
interconnect wires, which can even be achieved by a simple "bend"
in the complete set of interconnect wires. Therefore the area
overhead of de-interleaving network 870 is small. There is
a net benefit of total area reduction.

The result of bit-interleaving is that the alignment func- 
tion of alignment logic 820 is performed on each bit-group 
in parallel. The alignment amount is the same for each bit 
group. There is no need for the instruction bits at one end of 
the line buffer to travel the full line length of the line buffer 
or instruction cache 810. The bits simply travel within their 
bit-groups. The maximum distance traversed by any instruc- 
tion bit is therefore proportional to L rather than L*B, a 
savings of a factor of B. This yields a savings of a factor of 
B in the height of interconnection area for alignment logic 
820. 

There are similar savings in expansion logic 840. Within 
each bit-group, the bits are dispersed to single -bit slices of 
the N functional units. The bits within each bit-group then do 
not need to be sent over long distances. The distance is 
proportional to N rather than N*B, a savings of a factor of 
B. The dispersal of all bit-groups occurs in parallel. This
yields a savings of a factor of B in the height of
interconnection area for expansion logic 840.
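
As a worked illustration of the savings described above, take hypothetical values of L = 16 words per line, B = 32 bits per word and N = 8 functional units. These figures are assumptions chosen only to make the arithmetic concrete; the quantities are the proportionality terms named in the text, not absolute wire lengths.

```latex
\begin{align*}
\text{alignment, non-interleaved:} \quad & L \cdot B = 16 \cdot 32 = 512 \\
\text{alignment, bit interleaved:} \quad & L = 16 \\
\text{expansion, non-interleaved:} \quad & N \cdot B = 8 \cdot 32 = 256 \\
\text{expansion, bit interleaved:} \quad & N = 8
\end{align*}
```

In both cases the ratio is B = 32, consistent with the stated savings of a factor of B.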

A computer system using the bit interleaved logic of FIG.
8 is illustrated in FIG. 9. Lines read in from instruction
memory 902 by cache refill state machine 904 are bit
interleaved by interleaver logic 980 and stored in interleaved
format in instruction cache 910. A compacted instruction in
bit interleaved format is read out of instruction cache 910
and aligned in bit interleaved format by alignment
logic 920 for storage in alignment buffer 930. Expansion
logic 940 then disperses the bits from the instruction words 
of the compacted instruction in alignment buffer 930 and 
stores the bits of the expanded instruction in dispersed 
instruction buffer 950. The expanded instruction is then 
reassembled in non-interleaved format by de-interleaver 
network 970 so that full instruction codes are received by 
functional units 960. 
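
The path of FIG. 9 can be followed end to end in a small software model. The sketch below is only a behavioral illustration under stated assumptions: the word layout (8 opcode bits, 2 dispersal bits, 1 delimiter bit), the NOP encoding of zero, the four functional units, and every function name are hypothetical, and each instruction is assumed to contain at least one non-NOP word.

```python
NOP, N = 0, 4                 # assumed NOP encoding and number of functional units
OP_BITS, D_BITS = 8, 2        # assumed opcode and dispersal-code widths
WORD_BITS = OP_BITS + D_BITS + 1   # plus one delimiter bit

def pack(op, field, last):
    """One compacted word: opcode, dispersal code, delimiter bit."""
    return op | (field << OP_BITS) | (last << (OP_BITS + D_BITS))

def compact(instr):
    """Drop NOP words; set the delimiter on the last word that is kept."""
    kept = [(f, op) for f, op in enumerate(instr) if op != NOP]
    return [pack(op, f, int(i == len(kept) - 1)) for i, (f, op) in enumerate(kept)]

def interleave(words, width):
    return [[(w >> b) & 1 for w in words] for b in range(width)]

def deinterleave(groups):
    count = len(groups[0])
    return [sum(g[i] << b for b, g in enumerate(groups)) for i in range(count)]

def run(program):
    """Compact, interleave, align, expand, and de-interleave each instruction."""
    line = [w for instr in program for w in compact(instr)]
    groups = interleave(line, WORD_BITS)        # interleaver on cache refill
    delimiters = groups[OP_BITS + D_BITS]       # bit-group holding delimiter bits
    executed, start = [], 0
    for end, last in enumerate(delimiters):
        if not last:
            continue
        # Alignment: select the same word range inside every bit-group.
        aligned = [g[start:end + 1] for g in groups]
        field_groups = aligned[OP_BITS:OP_BITS + D_BITS]
        fields = [sum(fg[i] << k for k, fg in enumerate(field_groups))
                  for i in range(end + 1 - start)]
        # Expansion: disperse each opcode bit-group to single-bit slices.
        slices = [[(NOP >> b) & 1] * N for b in range(OP_BITS)]
        for b in range(OP_BITS):
            for bit, field in zip(aligned[b], fields):
                slices[b][field] = bit
        executed.append(deinterleave(slices))   # de-interleaver network
        start = end + 1
    return executed

program = [[0x11, NOP, NOP, 0x44], [NOP, 0x22, NOP, NOP], [1, 2, 3, 4]]
result = run(program)
```

In this model the instructions delivered to the functional units match the original uncompacted instructions, with NOP codes restored in the fields that were omitted from the compacted form.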

Having described and illustrated the principles of the 
invention in a preferred embodiment thereof, it should be 
apparent that the invention can be modified in arrangement 
and detail without departing from such principles. I claim all 
modifications and variations coming within the spirit and 
scope of the following claims. 

We claim:

1. A method for storing and decoding instructions for a 
microprocessor comprising the steps:

identifying each word of one of the instructions that does not
contain a NOP code; 

generating a dispersal code for each identified word, the 
dispersal code corresponding to a field of the instruc- 
tion occupied by the identified word; 

generating a delimiter code for each identified word, the 
delimiter code being set to identify a boundary between 
the identified words of the instruction and the identified 
words of an adjacent instruction and the delimiter code 
being otherwise clear; 

storing each identified word along with the corresponding 
dispersal code and the delimiter code in a compressed 
instruction, said compressed instruction lacking dis- 
persal codes for each word of said instruction that 
contains a NOP. 

2. The method of claim 1 including: 
fetching the compressed instruction; 

parsing the identified words of the compressed instruction 
from those of the adjacent instruction based on the 
delimiter code; 

restoring each identified word to the field of the instruc- 
tion indicated by the corresponding dispersal code; and 

placing NOP codes in each field of the instruction that 
does not contain an identified word; and 
distributing each instruction word to a corresponding
functional unit. 

3. The method of claim 2, wherein each dispersal code 
identifies a corresponding field on the instruction. 

4. The method of claim 2, wherein each dispersal code 
identifies one of the functional units, said one of said 
functional units being identified by its position in a sequence 
of functional units.

5. The method of claim 4, wherein the position of the 
functional units corresponding to the dispersal code is 
relative to a position of the instruction word in the instruc- 
tion after parsing. 

6. The method of claim 2 wherein: 

the step of fetching the instruction includes interleaving 
the bits of the instruction words of the instruction; 

the step of parsing the identified words of the instruction 
from those of the adjacent instruction based on the 
delimiter codes includes the step of parsing the bits of 
the identified words in interleaved format; 

the step of dispersing each identified word to the field of 
the instruction word indicated by the corresponding 
dispersal code includes dispersing the bits of each 
identified word in interleaved format; 

the step of placing NOP codes in each field of the 
instruction word that does not contain an identified 
word includes placing NOP codes in the bits of each 
field of the instruction word that does not contain an 
identified word in interleaved format; and 

the step of distributing the instruction words to said
corresponding functional unit includes de-interleaving
the bits of the instruction word before distributing the 
instruction words to said corresponding functional unit. 

7. The method of claim 6, wherein the dispersal code 
identifies one of a plurality of fields of the instruction. 

8. The method of claim 6, wherein each dispersal code 
identifies one of the functional units, said one of said 
functional units being identified by its position in a sequence 
of functional units. 

9. The method of claim 8, wherein the position of the 
functional units corresponding to the dispersal code is 
relative to a position of the instruction word in the instruc- 
tion after parsing. 